Point-Based Policy Iteration

Authors

  • Shihao Ji
  • Ronald Parr
  • Hui Li
  • Xuejun Liao
  • Lawrence Carin
Abstract

We describe a point-based policy iteration (PBPI) algorithm for infinite-horizon POMDPs. PBPI replaces the exact policy improvement step of Hansen's policy iteration with point-based value iteration (PBVI). Despite being an approximate algorithm, PBPI is monotonic: at each iteration before convergence, PBPI produces a policy whose value increases for at least one of a finite set of initial belief states and decreases for none of them. In contrast, PBVI cannot guarantee monotonic improvement of the value function or the policy. In practice, PBPI generally needs a lower density of point coverage over the belief simplex and tends to produce superior policies with less computation. Experiments on several benchmark problems (up to 12,545 states) demonstrate the scalability and robustness of the PBPI algorithm.
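The abstract gives no pseudocode, so the following is a minimal sketch of the point-based Bellman backup that PBVI performs and that PBPI reuses in its policy improvement step, restricted to a finite set of belief points. The array layout (T, O, R), the discount factor gamma, and the function name point_based_backup are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def point_based_backup(b, alphas, T, O, R, gamma):
    """One point-based Bellman backup at belief b (sketch, not the paper's code).

    b      : (S,) belief over states
    alphas : list of (S,) alpha-vectors representing the current value function
    T      : (A, S, S) transition probabilities T[a, s, s']
    O      : (A, S, O) observation probabilities O[a, s', o]
    R      : (A, S) expected immediate reward R[a, s]
    gamma  : discount factor in (0, 1)

    Returns the backed-up alpha-vector that maximizes b . alpha, and the
    greedy action that produces it.
    """
    A, S, num_obs = O.shape
    best_val, best_alpha, best_action = -np.inf, None, None
    for a in range(A):
        g_a = R[a].astype(float)
        for o in range(num_obs):
            # g_{a,o}^i(s) = sum_{s'} T[a, s, s'] * O[a, s', o] * alpha_i(s')
            g_ao = [T[a] @ (O[a, :, o] * alpha) for alpha in alphas]
            # keep the candidate that scores best at this particular belief
            g_a = g_a + gamma * max(g_ao, key=lambda g: b @ g)
        if b @ g_a > best_val:
            best_val, best_alpha, best_action = b @ g_a, g_a, a
    return best_alpha, best_action
```

Applying this backup at every belief in a fixed finite set B and keeping the improved vectors is what yields the per-point monotonicity the abstract claims: no belief in B can lose value before convergence.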


Similar resources

A full Nesterov-Todd step interior-point method for circular cone optimization

In this paper, we present a full Newton-step feasible interior-point method for circular cone optimization using Euclidean Jordan algebra. The search direction is based on the Nesterov-Todd scaling scheme, and only full Newton steps are used at each iteration. Furthermore, we derive an iteration bound that coincides with the currently best known iteration bound for small-update methods.


Robust partially observable Markov decision process

We seek to find the robust policy that maximizes the expected cumulative reward for the worst case when a partially observable Markov decision process (POMDP) has uncertain parameters whose values are only known to be in a given region. We prove that the robust value function, which represents the expected cumulative reward that can be obtained with the robust policy, is convex with respect to ...


New three-step iteration process and fixed point approximation in Banach spaces

In this paper we propose a new iteration process, called the $K^{\ast}$ iteration process, for the approximation of fixed points. We show that our iteration process is faster than the existing well-known iteration processes, using numerical examples. Stability of the $K^{\ast}$ iteration process is also discussed. Finally, we prove some weak and strong convergence theorems for Suzuki ge...
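The truncated abstract does not state the scheme itself. As a hedged illustration only, the sketch below implements one three-step form commonly written as z_n = (1-alpha) x_n + alpha T x_n, y_n = T((1-beta) z_n + beta T z_n), x_{n+1} = T y_n; whether this matches the paper's $K^{\ast}$ process is an assumption, and the test mapping cos(x) is ours.

```python
import math

def three_step_iteration(T, x0, alpha=0.5, beta=0.5, tol=1e-12, max_iter=1000):
    """Three-step fixed-point iteration (assumed K*-style form; see lead-in).

    z_n     = (1 - alpha) * x_n + alpha * T(x_n)
    y_n     = T((1 - beta) * z_n + beta * T(z_n))
    x_{n+1} = T(y_n)
    """
    x = x0
    for n in range(max_iter):
        z = (1 - alpha) * x + alpha * T(x)
        y = T((1 - beta) * z + beta * T(z))
        x_next = T(y)
        if abs(x_next - x) < tol:
            return x_next, n + 1
        x = x_next
    return x, max_iter

# Example: T(x) = cos(x) is a contraction near its unique fixed point ~0.739085.
fixed_point, iters = three_step_iteration(math.cos, x0=1.0)
print(fixed_point, iters)
```

Because each pass applies T three times, schemes of this shape typically reach a given tolerance in far fewer outer iterations than plain Picard iteration x_{n+1} = T(x_n), which is the kind of speed comparison the abstract refers to.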


Solving time-fractional chemical engineering equations by modified variational iteration method as fixed point iteration method

The variational iteration method (VIM) is extended to find approximate solutions of fractional chemical engineering equations. The Lagrange multipliers of the VIM are not identified explicitly. In this paper, we improve the VIM by using the concept of the fixed-point iteration method. This method is then applied to solve a system of time-fractional chemical engineering equations. The ob...
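The fixed-point view the abstract invokes can be illustrated, for the integer-order case, by Picard iteration on the equivalent integral equation y_{k+1}(x) = y_0 + \int_0^x f(t, y_k(t)) dt. The example below (y' = y, y(0) = 1) and the helper name picard_iterate are stand-ins, not the paper's fractional formulation or its modified correction functional.

```python
import sympy as sp

x = sp.symbols('x')

def picard_iterate(f, y0, n_steps):
    """Picard fixed-point iteration for y'(x) = f(x, y), y(0) = y0:
    y_{k+1}(x) = y0 + integral_0^x f(t, y_k(t)) dt
    """
    t = sp.symbols('t')
    y = sp.Integer(y0)
    for _ in range(n_steps):
        y = y0 + sp.integrate(f(t, y.subs(x, t)), (t, 0, x))
    return sp.expand(y)

# y' = y, y(0) = 1: the iterates are the Taylor partial sums of exp(x).
print(picard_iterate(lambda t, y: y, 1, 4))  # 1 + x + x**2/2 + x**3/6 + x**4/24
```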


Policy Iteration in Finite Templates Domain

We prove in this paper that policy iteration can be defined generally in a finite domain of templates using Lagrange duality. This policy iteration algorithm converges to a fixed point when a very simple technical condition holds. The fixed point furnishes a safe over-approximation of the set of reachable values taken by the variables of a program. We also prove that policy iteration can be ea...




Publication date: 2007